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Asymptotic formulae for likelihood-based tests of new physics 
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£■""». ■ Abstract 

We describe likelihood-based statistical tests for use in high energy physics for the 
discovery of new phenomena and for construction of confidence intervals on model pa- 
f^*) | rameters. We focus on the properties of the test procedures that allow one to account 

for systematic uncertainties. Explicit formulae for the asymptotic distributions of test 
statistics are derived using results of Wilks and Wald. We motivate and justify the use of 
a representative data set, called the "Asimov data set" , which provides a simple method 
to obtain the median experimental sensitivity of a search or measurement as well as 
S ■ fluctuations about this expectation. 
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1 Introduction 

In particle physics experiments one often searches for processes that have been predicted 
but not yet seen, such as production of a Higgs boson. The statistical significance of an 
observed signal can be quantified by means of a p- value or its equivalent Gaussian significance 
(discussed below). It is useful to characterize the sensitivity of an experiment by reporting 
the expected (e.g., mean or median) significance that one would obtain for a variety of signal 
hypotheses. 

Finding both the significance for a specific data set and the expected significance can 
involve Monte Carlo calculations that are computationally expensive. In this paper we in- 
vestigate approximate methods based on results due to Wilks [1] and Wald [2] by which one 
can obtain both the significance for given data as well as the full sampling distribution of the 
significance under the hypothesis of different signal models, all without recourse to Monte 
Carlo. In this way one can find, for example, the median significance and also a measure of 
how much one would expect this to vary as a result of statistical fluctuations in the data. 

A useful element of the method involves estimation of the median significance by replacing 
the ensemble of simulated data sets by a single representative one, referred to here as the 
"Asimov" data set£| In the past, this method has been used and justified intuitively (e.g., 
[U \5\). Here we provide a formal mathematical justification for the method, explore its 
limitations, and point out several additional aspects of its use. 

The present paper extends what was shown in Ref. [5j by giving more accurate formulas 
for exclusion significance and also by providing a quantitative measure of the statistical 
fluctuations in discovery significance and exclusion limits. For completeness some of the 
background material from [5j is summarized here. 

In Sec. [2] the formalism of a search as a statistical test is outlined and the concepts of 
statistical significance and sensitivity are given precise definitions. Several test statistics 
based on the profile likelihood ratio are defined. 

In Sec. [3j we use the approximations due to Wilks and Wald to find the sampling distri- 
butions of the test statistics and from these find p- values and related quantities for a given 
data sample. In Sec. H] we discuss how to determine the median significance that one would 
obtain for an assumed signal strength. Several example applications are shown in Sec. EJ 
and numerical implementation of the methods in the RooStats package is described in Sec.[H 
Conclusions are given in Sec. 

2 Formalism of a search as a statistical test 

In this section we outline the general procedure used to search for a new phenomenon in the 
context of a frequentist statistical test. For purposes of discovering a new signal process, 
one defines the null hypothesis, Hq, as describing only known processes, here designated as 
background. This is to be tested against the alternative H±, which includes both background 
as well as the sought after signal. When setting limits, the model with signal plus background 
plays the role of Hq, which is tested against the background-only hypothesis, H±. 

To summarize the outcome of such a search one quantifies the level of agreement of the 
observed data with a given hypothesis H by computing a p- value, i.e., a probability, under 



lr The name of the Asimov data set is inspired by the short story Franchise, by Isaac Asimov [3]- In it, 
elections are held by selecting the single most representative voter to replace the entire electorate. 



assumption of H, of finding data of equal or greater incompatibility with the predictions of H. 
The measure of incompatibility can be based, for example, on the number of events found in 
designated regions of certain distributions or on the corresponding likelihood ratio for signal 
and background. One can regard the hypothesis as excluded if its p-value is observed below 
a specified threshold. 

In particle physics one usually converts the p-value into an equivalent significance, Z, 
defined such that a Gaussian distributed variable found Z standard deviations abovqj its 
mean has an upper-tail probability equal to p. That is, 

z = $- 1 (i- P ), (1) 

where 3> _1 is the quantile (inverse of the cumulative distribution) of the standard Gaussian. 
For a signal process such as the Higgs boson, the particle physics community has tended 
to regard rejection of the background hypothesis with a significance of at least Z = 5 as 
an appropriate level to constitute a discovery. This corresponds to p = 2.87 x 10 -7 . For 
purposes of excluding a signal hypothesis, a threshold p- value of 0.05 (i.e., 95% confidence 
level) is often used, which corresponds to Z = 1.64. 

It should be emphasized that in an actual scientific context, rejecting the background-only 
hypothesis in a statistical sense is only part of discovering a new phenomenon. One's degree 
of belief that a new process is present will depend in general on other factors as well, such as 
the plausibility of the new signal hypothesis and the degree to which it can describe the data. 
Here, however, we only consider the task of determining the p-value of the background-only 
hypothesis; if it is found below a specified threshold, we regard this as "discovery". 

It is often useful to quantify the sensitivity of an experiment by reporting the expected 
significance one would obtain with a given measurement under the assumption of various 
hypotheses. For example, the sensitivity to discovery of a given signal process H\ could 
be characterized by the expectation value, under the assumption of H\, of the value of Z 
obtained from a test of Hq. This would not be the same as the Z obtained using Eq. (pQ) with 
the expectation of the p-value, however, because the relation between Z and p is nonlinear. 
The median Z and p will, however, satisfy Eq. (pQ) because this is a monotonic relation. 
Therefore in the following we will take the term 'expected significance' always to refer to the 
median. 

A widely used procedure to establish discovery (or exclusion) in particle physics is based 
on a frequentist significance test using a likelihood ratio as a test statistic. In addition to 
parameters of interest such as the rate (cross section) of the signal process, the signal and 
background models will contain in general nuisance parameters whose values are not taken 
as known a priori but rather must be fitted from the data. 

It is assumed that the parametric model is sufficiently flexible so that for some value of the 
parameters it can be regarded as true. The additional flexibility introduced to parametrize 
systematic effects results, as it should, in a loss in sensitivity. To the degree that the model is 
not able to reflect the truth accurately, an additional systematic uncertainty will be present 
that is not quantified by the statistical method presented here. 

To illustrate the use of the profile likelihood ratio, consider an experiment where for each 
selected event one measures the values of certain kinematic variables, and thus the resulting 



2 Some authors, e.g., Ref. [6j, have defined this relation using a two-sided fluctuation of a Gaussian variable, 
with a 5<r significance corresponding to p = 5.7 x 1CP 7 . We take the one-sided definition above as this gives 
Z = for p = 0.5. 



data can be represented as one or more histograms. Using the method in an unbinned analysis 
is a straightforward extension. 

Suppose for each event in the signal sample one measures a variable x and uses these 
values to construct a histogram n = (m, . . . , n^). The expectation value of nj can be written 

E[m] = usi + bi , (2) 

where the mean number of entries in the ith bin from signal and background are 



Si = s to t / f s {x;O s )dx, (3) 

Jhini 



bi = &tot / fb(x;O b )dx. (4) 

Jhini 

Here the parameter /i determines the strength of the signal process, with \x = corresponding 
to the background-only hypothesis and \i = 1 being the nominal signal hypothesis. The 
functions f s (x;6 s ) and fb(x;0),) are the probability density functions (pdfs) of the variable 
x for signal and background events, and 9 S and 6b represent parameters that characterize 
the shapes of pdfs. The quantities s to t and b to t are the total mean numbers of signal and 
background events, and the integrals in ([3]) and (JH) represent the probabilities for an event to 
be found in bin i. Below we will use 6 = (6 S , 6^, 6tot) to denote all of the nuisance parameters. 
The signal normalization s to t is not, however, an adjustable parameter but rather is fixed to 
the value predicted by the nominal signal model. 

In addition to the measured histogram n one often makes further subsidiary measurements 
that help constrain the nuisance parameters. For example, one may select a control sample 
where one expects mainly background events and from them construct a histogram of some 
chosen kinematic variable. This then gives a set of values m = (mi, . . . , tum) for the number 
of entries in each of the M bins. The expectation value of toj can be written 

E[mi] = u t {6) , (5) 

where the Ui are calculable quantities depending on the parameters 0. One often constructs 
this measurement so as to provide information on the background normalization parameter 
&tot and also possibly on the signal and background shape parameters. 

The likelihood function is the product of Poisson probabilities for all bins: 



N fue.-L h\ n i M n mk 



To test a hypothesized value of fi we consider the profile likelihood ratio 

A(,) - ^ ■ (7) 

£(£,») 

Here 6 in the numerator denotes the value of that maximizes L for the specified //, i.e., 
it is the conditional maximum-likelihood (ML) estimator of 6 (and thus is a function of ji). 



The denominator is the maximized (unconditional) likelihood function, i.e., fi and 6 are their 
ML estimators. The presence of the nuisance parameters broadens the profile likelihood as a 
function of \i relative to what one would have if their values were fixed. This reflects the loss 
of information about \i due to the systematic uncertainties. 

In many analyses, the contribution of the signal process to the mean number of events is 
assumed to be non-negative. This condition effectively implies that any physical estimator 
for [i must be non-negative. Even if we regard this to be the case, however, it is convenient 
to define an effective estimator fi as the value of \x that maximizes the likelihood, even this 
gives fi < (but providing that the Poisson mean values, /iSj + bi, remain nonnegative) . This 
will allow us in Sec. l3.ll to model fi as a Gaussian distributed variable, and in this way we can 
determine the distributions of the test statistics that we consider. Therefore in the following 
we will always regard /} as an effective estimator which is allowed to take on negative values. 

2.1 Test statistic t M = -21nA(/x) 

From the definition of A(/i) in Eq. ([7]), one can see that < A < 1, with A near 1 implying good 
agreement between the data and the hypothesized value of \i. Equivalently it is convenient 
to use the statistic 

t„ = -21nA(/x) (8) 

as the basis of a statistical test. Higher values of t^ thus correspond to increasing incompat- 
ibility between the data and [i. 

We may define a test of a hypothesized value of [i by using the statistic t^ directly 
as measure of discrepancy between the data and the hypothesis, with higher values of t M 
correspond to increasing disagreement. To quantify the level of disagreement we compute 
the p- value, 



P» = / /(*»<&„• (9) 



/.i,obs 



where i^obs is the value of the statistic t^ observed from the data and /(i M |^) denotes the 
pdf of t^ under the assumption of the signal strength fi. Useful approximations for this and 
other related pdfs are given in Sec. 13.31 The relation between the p-value and the observed 
tfj, and also with the significance Z are illustrated in Fig. [TJ 

When using the statistic t^, a data set may result in a low p- value in two distinct ways: 
the estimated signal strength fi may be found greater or less than the hypothesized value \x. 
As a result, the set of fj, values that are rejected because their p- values are found below a 
specified threshold a may lie to either side of those values not rejected, i.e., one may obtain 
a two-sided confidence interval for /i. 

2.2 Test statistic ty. for fi > 

Often one assumes that the presence of a new signal can only increase the mean event rate 
beyond what is expected from background alone. That is, the signal process necessarily has 
[i > 0, and to take this into account we define an alternative test statistic below called t^. 
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Figure 1: (a) Illustration of the relation between the p-value obtained from an observed value of 



the test statistic t^. (b) The standard normal distribution (p(x) = (1/\/2tt) exp(— x 2 /2) showing the 
relation between the significance Z and the p-value. 



For a model where [i > 0, if one finds data such that jl < 0, then the best level of 
agreement between the data and any physical value of /i occurs for [i = 0. We therefore 
define 



\(p) 



£(0,§(0)) 



A>0, 
A<0 



(10) 



Here 0(0) and 0(fi) refer to the conditional ML estimators of given a strength parameter 
of or fi, respectively. 

The variable \(/i) can be used instead of A(/i) in Eq. ([8]) to obtain the corresponding test 
statistic, which we denote t^. That is, 



-21nA(/i) 



2]n Lfr,0(/j)) 
L(0,f(0)) 



A<o, 



(ii) 



As was done with the statistic t^, one can quantify the level of disagreement between the 
data and the hypothesized value of fi with the p-value, just as in Eq. ([9]). For this one needs 



the distribution of t^, an approximation of which is given in Sec. [3? 

Also similar to the case of t^, values of [i both above and below jl may be excluded by a 
given data set, i.e., one may obtain either a one-sided or two-sided confidence interval for fi. 
For the case of no nuisance parameters, the test variable t^ is equivalent to what is used in 
constructing confidence intervals according to the procedure of Feldman and Cousins [8] . 



2.3 Test statistic q for discovery of a positive signal 

An important special case of the statistic i^ described above is used to test /j = in a class 
of model where we assume fi > 0. Rejecting the [i = hypothesis effectively leads to the 
discovery of a new signal. For this important case we use the special notation qo = to. Using 
the definition (llip with [i = one finds 



-21nA(0) A>0, 
Qo = { (12) 

o A < o , 

where A(0) is the profile likelihood ratio for \i = as defined in Eq. ([7|). 

We may contrast this to the statistic £o> i- e -> Eq. (JSJ) , used to test \x = 0. In that case 
one may reject the fj, = hypothesis for either an upward or downward fluctuation of the 
data. This is appropriate if the presence of a new phenomenon could lead to an increase or 
decrease in the number of events found. In an experiment looking for neutrino oscillations, 
for example, the signal hypothesis may predict a greater or lower event rate than the no- 
oscillation hypothesis. 

When using qo, however, we consider the data to show lack of agreement with the 
background-only hypothesis only if p, > 0. That is, a value of /} much below zero may 
indeed constitute evidence against the background-only model, but this type of discrepancy 
does not show that the data contain signal events, but rather points to some other systematic 
error. For the present discussion, however, we assume that the systematic uncertainties are 
dealt with by the nuisance parameters 0. 

If the data fluctuate such that one finds fewer events than even predicted by background 
processes alone, then jl < and one has qo = 0. As the event yield increases above the 
expected background, i.e., for increasing jl, one finds increasingly large values of qo, corre- 
sponding to an increasing level of incompatibility between the data and the [i = hypothesis. 

To quantify the level of disagreement between the data and the hypothesis of \i = using 
the observed value of qo we compute the p- value in the same manner as done with t^, namely, 

rod 

P0= f(qo\0)dq . (13) 

•"''JO.obs 

Here f(qo\0) denotes the pdf of the statistic qo under assumption of the background-only 
(/i = 0) hypothesis. An approximation for this and other related pdfs are given in Sec. 13.51 

2.4 Test statistic q^ for upper limits 

For purposes of establishing an upper limit on the strength parameter fi, we consider two 
closely related test statistics. First, we may define 

f-21nA(/i) ft< ix, 
to = n - ( 14 ) 

where X(p) is the profile likelihood ratio as defined in Eq. ([7]). The reason for setting q^ = 
for jj, > fi is that when setting an upper limit, one would not regard data with p, > \i as 
representing less compatibility with \x than the data obtained, and therefore this is not taken 
as part of the rejection region of the test. From the definition of the test statistic one sees that 
higher values of q^ represent greater incompatibility between the data and the hypothesized 
value of fx. 

One should note that go is not simply a special case of g M with fj, = 0, but rather has a 
different definition (see Eqs. (fT2|) and Q14p ). That is, qo is zero if the data fluctuate downward 
(jl < 0), but q^ is zero if the data fluctuate upward (jl > fi). With that caveat in mind, we will 



often refer in the following to q^ with the idea that this means either go or Qu. as appropriate 
to the context. 

As with the case of discovery, one quantifies the level of agreement between the data and 
hypothesized fi with p-value. For, e.g., an observed value Q M)0 bs) one has 

/•oo 

Pn= f{%\^)dq^ , (15) 

which can be expressed as a significance using Eq. (pQ). Here f{q^\^) is the pdf of q^ assuming 
the hypothesis \x. In Sec. 13.61 we provide useful approximations for this and other related 
pdfs. 

2.5 Alternative test statistic q^ for upper limits 

For the case where one considers models for which [i > 0, the variable A(/i) can be used 
instead of A(//) in Eq. ([Til) to obtain the corresponding test statistic, which we denote q^. 
That is, 



-2 In A(ju) p, < n 
/} > jj, 



L(o,e(o)) 

01 L(u,6(u)) n ^ ~ ^ (16) 

jx > [i . 



We give an approximation for the pdf f{q^\^') in Sec. 13.71 

In numerical examples we have found that the difference between the tests based on q^ 
(Eq. ([H]) ) and <j M usually to be negligible, but use of q^ leads to important simplifications. 
Furthermore, in the context of the approximation used in Sec. [3j the two statistics are equiv- 
alent. That is, assuming the approximations below, q^ can be expressed as a monotonic 
function of q^ and thus they lead to the same results. 

3 Approximate sampling distributions 

In order to find the p-value of a hypothesis using Eqs. (113[) or ([15D we require the sampling 
distribution for the test statistic being used. In the case of discovery we are testing the 
background-only hypothesis (fx = 0) and therefore we need f(qo\0), where qo is defined by 
Eq. ([12]) . When testing a nonzero value of /j, for purposes of finding an upper limit we need 
the distribution /(^|/i) where q^ is defined by Eq. ([14]) . or alternatively we require the pdf 
of the corresponding statistic q^ as defined by Eq. ([T6]) . In this notation the subscript of q 
refers to the hypothesis being tested, and the second argument in /(^|/i) gives the value of 
\x assumed in the distribution of the data. 

We also need the distribution f(q^\fJ,') with li ^ it! to find what significance to expect and 
how this is distributed if the data correspond to a strength parameter different from the one 
being tested. For example, it is useful to characterize the sensitivity of a planned experiment 
by quoting the median significance, assuming data distributed according to a specified signal 
model, with which one would expect to exclude the background-only hypothesis. For this one 
would need /(qolfJ-'), usually with // = 1. From this one can find the median qo, and thus the 



median discovery significance. When considering upper limits, one would usually quote the 
value of \i for which the median p-value is equal to 0.05, as this gives the median upper limit 
on fjL at 95% confidence level. In this case one would need /(^|0) (or alternatively f(q^\0)). 

In Sec. 13. II we present an approximation for the profile likelihood ratio, valid in the large 
sample limit. This allows one to obtain approximations for all of the required distributions, 
which are given in Sections 13.31 through 13.61 The approximations become exact in the large 
sample limit and are in fact found to provide accurate results even for fairly small sample sizes. 
For very small data samples one always has the possibility of using Monte Carlo methods to 
determine the required distributions. 

3.1 Approximate distribution of the profile likelihood ratio 

Consider a test of the strength parameter /x, which here can either be zero (for discovery) or 
nonzero (for an upper limit), and suppose the data are distributed according to a strength 
parameter //. The desired distribution /(q^l//) can be found using a result due to Wald [2], 
who showed that for the case of a single parameter of interest, 

- 2 In X(p) = (V—A- + 0(1/ VN) . (17) 

Here p, follows a Gaussian distribution with a mean fj,' and standard deviation a, and N 
represents the data sample size. The standard deviation a of fi is obtained from the covariance 
matrix of the estimators for all the parameters, Vij = cov[#j, 9j], where here the 9i represent 
both \i as well as the nuisance parameters (e.g., take 9q = l^-, so f 2 = ^oo)- in the large- 
sample limit, the bias of ML estimators in general tend to zero, in which case we can write 
the inverse of the covariance matrix as 



Vrr 1 = -E 



d 2 lnL 
dOidOj 



(18) 



where the expectation value assumes a strength parameter \j! '. The approximations presented 
here are valid to the extent that the 0(l/y/N) term can be neglected, and the value of a can 
be estimated, e.g., using Eq. (fTSj) . In Sec. 13.21 we present an alternative way to estimate a 
which lends itself more directly to determination of the median significance. 

If fi is Gaussian distributed and we neglect the 0(1/ \N) term in Eq. (fTTJ) . then one can 
show that the statistic tn = — 21nA(//) follows a noncentral chi-square distribution for one 
degree of freedom (see, e.g., [9]), 



f(t,;A) ' ' 



where the noncentrality parameter A is 



exp(--(vV + ^)j +expU-(^-v / A V 



(19) 



A = ^#l!. (20) 

For the special case (j! = fi one has A = and —2 In X(fj,) approaches a chi-square distribution 
for one degree of freedom, a result shown earlier by Wilks pQ. 



The results of Wilks and Wald generalize to more than one parameter of interest. If 
the parameters of interest can be explicitly identified with a subset of the parameters r = 
(#i, . . . ,0 r ), then the distribution of —2lnX(6 r ) follows a noncentral chi-square distribution 
for r-degrees of freedom with noncentrality parameter 

A r = j^i^-O^V^iOj-O',), (21) 

where V { ~ is the inverse of the submatrix one obtains from restricting the full covariance 
matrix to the parameters of interest. The full covariance matrix is given from inverting 
Eq. (fTKj) . and we show an efficient way to calculate it in Sec. 13.21 

3.2 The Asimov data set and the variance of ft 

Some of the formulae given require the standard deviation a of ft, which is assumed to follow 
a Gaussian distribution with a mean of \J . Below we show two ways of estimating a, both of 
which are closely related to a special, artificial data set that we call the "Asimov data set" . 

We define the Asimov data set such that when one uses it to evaluate the estimators for 
all parameters, one obtains the true parameter values. Consider the likelihood function for 
the generic analysis given by Eq. ([6]). To simplify the notation in this section we define 



v; 



fi'st + bi . (22) 



Further let 9q = fi represent the strength parameter, so that here di can stand for any of the 
parameters. The ML estimators for the parameters can be found by setting the derivatives 
of In L with respect to all of the parameters equal to zero: 

<91nL x-^ (m \ dvi x-^ frrn \ dm 



This condition holds if the Asimov data, rii a and m, a, are equal to their expectation values: 



n iA = Elm] = Vi = n'si{9) + bi{9) , (24) 

mi,A = E[mi] = Ui(0) . (25) 

Here the parameter values represent those implied by the assumed distribution of the data. 
In practice, these are the values that would be estimated from the Monte Carlo model using 
a very large data sample. 

We can use the Asimov data set to evaluate the "Asimov likelihood" La and the cor- 
responding profile likelihood ratio Aa- The use of non- integer values for the data is not a 
problem as the factorial terms in the Poisson likelihood represent constants that cancel when 
forming the likelihood ratio, and thus can be dropped. One finds 

L A (n,0) L A (n,e) 
L A (jjt,9) L A {n',8) 

10 



where the final equality above exploits the fact that the estimators for the parameters are 
equal to their hypothesized values when the likelihood is evaluated with the Asimov data set. 

A standard way to find a is by estimating the matrix of second derivatives of the log- 
likelihood function (cf. Eq. (I18j) ) to obtain the inverse covariance matrix V , inverting to 
find V, and then extracting the element Voo corresponding to the variance of jX. The second 
derivative of In L is 



d 2 lnL 
dOjdOk 



N 

E 



u { ) d6 3 d6 k 86 'j 0d k uf 



M 



+ E 



i=\ 



mi \ d 2 Ui dui dui rrii 
Ui J dOjdOk dOj ddk u 2 



(27) 



From (|27p one sees that the second derivative of InL is linear in the data values rii and mj. 
Thus its expectation value is found simply by evaluating with the expectation values of the 
data, which is the same as the Asimov data. One can therefore obtain the inverse covariance 
matrix from 



V: 



Jk 



-E 



S 2 lnL 



ddjdd k 



^ In L A 
" dOjdO k 



y--^ dvi dui 1 ^ dui dui 1 

^ WjWkVi + ^ WjWk^Ti 



(28) 



In practice one could, for example, evaluate the the derivatives of In La numerically, use this 
to find the inverse covariance matrix, and then invert and extract the variance of ft. One can 
see directly from Eq. (|28p that this variance depends on the parameter values assumed for 
the Asimov data set, in particular on the assumed strength parameter //, which enters via 
Eq. d22|). 



Another method for estimating a (denoted oa. in this section to distinguish it from the 
approach above based on the second derivatives of In L) is to find find the value that is neces- 
sary to recover the known properties of — Aa (/•*)• Because the Asimov data set corresponding 
to a strength // gives p, = fj,', from Eq. (fT7]) one finds 



2 In A A (ji) 



(m-mO 



/\2 



A 



o* 



(29) 



That is, from the Asimov data set one obtains an estimate of the noncentrality parameter A 
that characterizes the distribution f(qu\fJ,')- Equivalently, one can use Eq. (|29|) to obtain the 
variance a 2 which characterizes the distribution of /2, namely, 



4 






/\2 



(30) 



where q^a = — 21n \x(fj,). For the important case where one wants to find the median 
exclusion significance for the hypothesis \i assuming that there is no signal, then one has 



A 






(31) 
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and for the modified statistic q^ the analogous relation holds. For the case of discovery where 
one tests \i = one has 

oi = f-. (32) 

The two methods for obtaining a and A — from the Fisher information matrix or from 
Qfi,A — are not identical, but were found to provide similar results in examples of of practical 
interest. In several cases that we considered, the distribution based on <ta provided a better 
approximation to the true sampling distribution than the standard approach based on the 
Fisher information matrix, leading to the conjecture that it may effectively incorporate some 
higher-order terms in Eq. (|17p . 

This can be understood qualitatively by noting that under assumption of the Wald ap- 
proximation, the test statistics qo, q^ and q^ are monotonically related to fi, and therefore 
their median values can be found directly by using the median of p, which is p' . But mono- 
tonicity is a weaker condition than the full Wald approximation. That is, even if higher-order 
terms are present in Eq. (|17p . they will not alter the distribution's median as long as they 
do not break the monotonicity of the relation between the test statistic and /2. If one uses 
<7A one obtains distributions with medians given by the corresponding Asimov values, go,A 
or g^A' an d these values will be correct to the extent that monotonicity holds. 

3.3 Distribution of t^ 

Consider first using the statistic t^ = — 21nA(/i) of Sec. 12.11 as the basis of the statistical 
test of a hypothesized value of fi. This could be a test of fi = for purposes of establishing 
existence of a signal process, or non-zero values of p for purposes of obtaining a confidence 
interval. To find the p- value p^, we require the pdf /(t^lfi), and to find the median p- value 
assuming a different strength parameter we will need f(t^\p'). 

The pdf f(t [mI/j,') is given by Eq. (fT9|) . namely, 



/(*>') 



1 1 



exp 




2V^V2^ 
The special case \x = // is simply a chi-square distribution for one degree of freedom: 



(33) 



/(*» = -L-L e -W2 (34) 

V27T VV 



The cumulative distribution of tu assuming /j,' is 

f(W) = $ (yr» + ^) + $ (yr» - ^) - 1 , (35) 

where $ is the cumulative distribution of the standard (zero mean, unit variance) Gaussian. 
The special case fi = fi' is therefore 



F(t» = 2$ (JQ - 1 , (36) 
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The p- value of a hypothesized value of [i for an observed value £„ is therefore 



p M = l-F(t» = 2(l-$(v^)) 

and the corresponding significance is 



(37) 



Z^ = $-\l - p M ) = CD" 1 (2<D (VQ - 1) • (38) 

If the p- value is found below a specified threshold a (often one takes a = 0.05), then the 
value of \i is said to be excluded at a confidence level (CL) of 1 — a. The set of points not 
excluded form a confidence interval with CL = 1 — a. Here the endpoints of the interval can 
be obtained simply by setting p ii = a and solving for \i. Assuming the Wald approximation 
fT7|) and using Eq. ([37]) one finds 



/V/io = A±0"$ 1 ( 1 ~ a / 2 ) • 



(39) 



One subtlety with this formula is that a itself depends at some level on \x. In practice to find 
the upper and lower limits one can simply solve numerically to find those values of \i that 
satisfy p^ = a. 

3.4 Distribution of i„ 



Assuming the Wald approximation, the statistic in as defined by Eq. ([lip can be written 

1,2 *£ A < o , 
A>o. 



t, 



HZ 

73" 



(m-m) 2 



From this the pdf f (i „](/,' ) is found to be 



(40) 



f(W 



1 1 1 

27^77? 



exp 



2 I V V + 



1_1 1_ 



exp 



+ < 



a 



1 I Jf — M=i£ 

2 I V /* 



1 



2tt(2^/o-) 



exp 



2 (2/Vo-)* 



U — 2UU 

— 7i — 



ip < V 2 /a 2 , 



2 /^2 



t M > n z /a 



(41) 



(42) 



The special case H = p' is therefore 



l-Jie-e"^/ 2 



/(<>') 



2 V2T,/L e 



2ir(2fi/ff) 



exp 



2 (2/V<7)* 



t„ < A* 2 /* 2 
^ > /i 2 /o- 2 



(43) 
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The corresponding cumulative distribution is 



F(t>') = $ ./£ + 



A* — A* 

(7 



$ ( ^ - ^ I - I 






2 /„2 



<u < ^/0- : 



2 /^2 



1 t M > / U7cr : 



(44) 



For p, = p! this is 



*X*» 




ip < p 2 /<j 2 , 
1 if, > p 2 /a 2 . 



(45) 



The p-value of the hypothesized /i is given by one minus the cumulative distribution, under 
assumption of the parameter /j,, 

p M = l-F(t». (46) 

The corresponding significance is Z^ = <5 _1 (1 —pu)- 

A confidence interval for p at confidence level CL = 1 — a can be constructed from the set 
fj, values for which the p-value is not less than a. To find the endpoints of this interval, one 
can set p^ from Eq. ()46p equal to a and solve for fi. In general this must be done numerically. 
In the large sample limit, i.e., assuming the validity of the asymptotic approximations, these 
intervals correspond to the limits of Feldman and Cousins [8\ for the case where physical 
range of the parameter p is p > 0. 



3.5 Distribution of q (discovery) 

Assuming the validity of the approximation (|17p . one has — 21nA(0) = p 2 /a 2 . From the 
definition (fl~2|) of qo , we therefore have 



?o 



p? jo 2 p > , 
p < , 



(47) 



where p follows a Gaussian distribution with mean y! and standard deviation a. From this 
one can show that the pdf of qo has the form 



For the special case of // = 0, this reduces to 



If/- p'^' 



(48) 



/(*|0) = ^(*) + ^^ 



-So/2 



(49) 



14 



That is, one finds a mixture of a delta function at zero and a chi-square distribution for one 
degree of freedom, with each term having a weight of 1/2. In the following we will refer to 
this mixture as a half chi-square distribution or 5X1- 

Prom Eq. (|48|) the corresponding cumulative distribution is found to be 

F(q \n') = <f>(yqt-^) ■ (50) 

The important special case //' = is therefore simply 

F(<fo|0) = *(V«b) • (51) 

The p- value of the \x = hypothesis (see Eq. (f!3j) ) is 

p = l-F(go|0), (52) 

and therefore using Eq. (pQ) for the significance one obtains the simple formula 

Z = $-\l - p ) = ^% . (53) 

3.6 Distribution of g M (upper limits) 

Assuming the validity of the Wald approximation, we can write the test statistic used for 
upper limits, Eq. (fl4l) as 

%= { (54) 

[0 jx> n , 

where \x as before follows a Gaussian centred about // with a standard deviation a. 
The pdf f(qfj,\fjf) is found to be 



1 / /\ 2 

1 ( r— V- V 



(55) 



so that the special case [i = // is a half-chi-square distribution: 

/(<?» = £%.) + ^4=4= e ~ W2 . (56) 

The cumulative distribution is 

F(g>') = $(W-^^) , (57) 

and the corresponding special case fjf = /i is thus the same as what was found for qo, namely, 

F(g» = *(v«£) • (58) 
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The p- value of the hypothesized ji is 



and therefore the corresponding significance is 



(59) 



Z / , = $- i (l-p /i ) = v ^. 



(60) 



As with the statistic i« above, if the p-value is found below a specified threshold a (often 
one takes a = 0.05), then the value of [i is said to be excluded at a confidence level (CL) of 
1 — a. The upper limit on fi is the largest /i with p^ < a. Here this can be obtained simply 
by setting p fJL = a and solving for fi. Using Eqs. (J54"]) and ([59]) one finds 



Mup = jj, + a$ 1 (l -a) . 



(61) 



For example, a = 0.05 gives $ _1 (1 — a) = 1.64. Also as noted above, a depends in general on 
the hypothesized [i. Thus in practice one may find the upper limit numerically as the value 
of fi for which Pu = a. 

3.7 Distribution of q^ (upper limits) 

Using the alternative statistic q^ defined by Eq. (fT6| and assuming the Wald approximation 
we find 



4 



2[i(i 



A<o 



9/i 



The pdf f{qu\n') is found to be 



^# 0</i</x 

p, > ji . 



(62) 



/(?>') 



<T> ( ^-^ ) 8% 



a 



1 1 1 



+ 



1 

v / 2^(2^/<t) 

The special case /i = // is therefore 



exp 



exp 



iO 



M-M 



/\2' 



'2 (2^p 



< q„ < ^/<J 2 



(63) 



/(&!/*) = o<5(<?m) + 



1^ ^ e -<W2 



exp 



/2^(2^/cr) 

The corresponding cumulative distribution is 



1 (g M +M 2 /<x 2 ) 2 
' 2 (2/V<r) a 



< % < fi 2 /a 2 , 



<7m > M 7 



2 /„2 



<7 . 



(64) 
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^>') - <! (65) 



The special case /i = /J,' is 



$(v^) < «? M < ^ 2 , 

^(<?» = < (66) 

1 * {H^-) % > ^/- 2 . 

The p- value of the hypothesized fj, is as before given by one minus the cumulative distribution, 

Pll = l-F(q^), (67) 

and therefore the corresponding significance is 

% 0<q^< fi 2 /a 2 , 

Z„ = { (68) 

g M +/» 2 /a- 2 - „2/„2 



"277° 



9/x > /" /o- • 



As when using q^, the upper limit on \i at confidence level 1 — a is found by setting p^ = a 
and solving for //, which reduces to the same result as found when using q^, namely, 

H up = fi + a^- 1 (l-a). (69) 

That is, to the extent that the Wald approximation holds, the two statistics q^ and q^ lead 
to identical upper limits. 

3.8 Distribution of — 2ln(L s+b /L b ) 

Many analyses carried out at the Tevatron Collider (e.g., |10| ) involving searches for a new 
signal process have been based on the statistic 

q = -2ln^, (70) 

where L s+b is the likelihood of the nominal signal model and L b is that of the background- 
only hypothesis. That is, the s + b corresponds to having the strength parameter [i = 1 and 
L b refers to fi = 0. The statistic q can therefore be written 

q = -2 In L ^ = *' 0(1)) = -2 In A(l) + 2 In A(0) . (71) 

L(jjl = 0,9(0)) 

Assuming the validity of the Wald approximation (|17p , q is given by 

, = (^a!_4 = iz|A ] (72) 

cH a z a 1 
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where as previously a 2 is the variance of fi. As (i, follows a Gaussian distribution, the distri- 
bution of q is also seen to be Gaussian, with a mean value of 



E[q] = l ~^ (73) 



and a variance of 



V[q] = ~ 2 ■ (74) 

That is, the standard deviation of q is a q = 2/er, where the standard deviation of jl, a, can 
be estimated, e.g., using the second derivatives of the log-likelihood function as described in 
Sec. 13.11 or with the methods discussed in Sec. 13.21 Recall that in general a depends on the 
hypothesized value of fi; here we will refer to these as ab and a s+ b for the \i = and fj, = 1 
hypotheses, respectively. 

From Eq. (j73[) one sees that for the s + b hypothesis (fi = 1) the values of q tend to be 
lower, and for the b hypothesis (fj, = 0) they are higher. Therefore we can find the p-values 
for the two hypothesis from 



Ps +b = r M* + b)dq = l-$ f gobs + 1/as+b ) , (75) 

w:>'^-*(t)' ™ 

where we have used Eqs. (|T3|) and (|74p for the mean and variance of q under the b and s + b 
hypotheses. 

The p-values from Eqs. (I75p and (I76p incorporate the effects of systematic uncertainties 
to the extent that these are connected to the nuisance parameters 6. In analyses done at the 
Tevatron such as in Ref. |10| . these effects are incorporated into the distribution of q in a 
different but largely equivalent way. There, usually one treats the control measurements that 
constrain the nuisance parameters as fixed, and to determine the distribution of q one only 
generates the main search measurement (i.e., what corresponds in our generic analysis to the 
histogram n). The effects of the systematic uncertainties are taken into account by using the 
control measurements as the basis of a Bayesian prior density tt(6), and the distribution of q 
is computed under assumption of the Bayesian model average 

f(q) = Jf(q\e)ir(e)d9. (77) 

The prior pdf tt(6) used in Eq. (|77p would be obtained from some measurements char- 
acterized by a likelihood function Lg(0), and then used to find the prior ir(0) using Bayes' 
theorem, 

tt(0) oc L (6>)vr o (6») . (78) 

Here ttq(6) is the initial prior for that reflected one's knowledge before carrying out the 
control measurements. In many cases this is simply take as a constant, in which case it(0) is 
simply proportional to Lg(0). 
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In the approach of this paper, however, all measurements are regarded as part of the 
data, including control measurements that constrain nuisance parameters. That is, here to 
generate a data set by MC means, for a given assumed point in the model's parameter space, 
one simulates both the control measurements and the main measurement. Although this is 
done for a specific value of 0, in the asymptotic limit the distributions required for computing 
the p-values ([75]) and ([76]) are only weakly dependent on 6 to the extent that this can affect 
the standard deviation a q . By contrast, in the Tevatron approach one generates only the 
main measurement with data distributed according to the averaged model (|77p . In the case 
where the nuisance parameters are constrained by Gaussian distributed estimates and the 
initial prior ttq{0) is taken to be constant, the two methods are essentially equivalent. 

Assuming the Wald approximation holds, the statistic q as well as go from Eq. (|12p . 
q^ from Eq. (114p and q~u from Eq. (116j) are all monotonic functions of ft, and therefore all 
are equivalent to £i in terms of yielding the same statistical test. If there are no nuisance 
parameters, then the Neyman-Pearson lemma (see, e.g., [7]) states that the likelihood ratio 
Lg+b/Lb (or equivalently q) is an optimal test statistic in the sense that it gives the maximum 
power for a test of the background-only hypothesis with respect to the alternative of signal 
plus background (and vice versa). But if the Wald approximation holds, then qo and q^ 
lead to equivalent tests and are therefore also optimal in the Neyman-Pearson sense. If the 
nuisance parameters are well constrained by control measurements, then one expects this 
equivalence to remain approximately true. 

Finally, note that in many analyses carried out at the Tevatron, hypothesized signal 
models are excluded based not on whether the p- value p s +b from Eq. (J75D is less than a given 
threshold a, but rather the ratio CL S = p s+ {,/(l — Pb) is compared to a. We do not consider 
this final step here; it is discussed in, e.g., Ref. [12] . 

4 Experimental sensitivity 

To characterize the sensitivity of an experiment, one is interested not in the significance 
obtained from a single data set, but rather in the expected (more precisely, median) signifi- 
cance with which one would be able to reject different values of [i. Specifically, for the case 
of discovery one would like to know the median, under the assumption of the nominal signal 
model (n = 1), with which one would reject the background-only (/z = 0) hypothesis. And for 
the case of setting exclusion limits the sensitivity is characterized by the median significance, 
assuming data generated using the [i = hypothesis, with which one rejects a nonzero value 
of fi (usually \x = 1 is of greatest interest). 

The sensitivity of an experiment is illustrated in Fig. [2j which shows the pdf for q^ 
assuming both a strength parameter fj, and also assuming a different value fjl '. The distribution 
ffanlfjf) is shifted to higher value of q^, corresponding on average to lower p-values. The 
sensitivity of an experiment can be characterized by giving the p-value corresponding to the 
median q^ assuming the alternative value //. As the p- value is a monotonic function of q^, 
this is equal to the median p- value assuming //. 

In the rest of this section we describe the ingredients needed to determine the experi- 
mental sensitivity (median discovery or exclusion significance). In Sec. 13.21 we introduced the 
Asimov data set, in which all statistical fluctuations are suppressed. This will lead directly 
to estimates of the experimental sensitivity (Sec. 14. ip as well as providing an alternative 
estimate of the standard deviation a of the estimator fi. In Sec. 14.21 we indicate how the 
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Figure 2: Illustration of the the p- 
value corresponding to the median 
of q m assuming a strength parame- 
ter fi' (see text). 



procedure can be extended to the case where several search channels are combined, and in 
Sec. 14.31 we describe how to give statistical error bands for the sensitivity. 



4.1 The median significance from Asimov values of the test statistic 

By using the Asimov data set one can easily obtain the median values of qo, q^ and q^, and 
these lead to simple expressions for the corresponding median significance. From Eqs. ()53|) . 
(|60p and (I68D one sees that the significance Z is a monotonic function of q, and therefore 
the median Z is simply given by the corresponding function of the median of q, which is 
approximated by its Asimov value. For discovery using go one wants the median discov- 
ery significance assuming a strength parameter // and for upper limits one is particularly 
interested in the median exclusion significance assuming fi' = 0, med[^|0]. For these one 
obtains 



medfZol//'] = ^/<?o,A , 
med[Z M |0] = y/q^K ■ 



(79) 
(80) 



When using q^ for establishing upper limits, the general expression for the exclusion 
significance Z^ is somewhat more complicated depending on //, but is in any case found by 
substituting the appropriate values of q^A and a a into Eq. ([68]) . For the usual case where one 
wants the median significance for \i assuming data distributed according to the background- 
only hypothesis (// = 0), Eq. (J68I) reduces in fact to a relation of the same form as Eq. (160 j) . 
and therefore one finds 



med^jO] = Jq^A 



(81) 



4.2 Combining multiple channels 

In many analyses, there can be several search channels which need to be combined. For 
each channel i there is a likelihood function Li(p, Oi), where Q% represents the set of nuisance 
parameters for the ith channel, some of which may be common between channels. Here 
the strength parameter [i is assumed to be the same for all channels. If the channels are 
statistically independent, as can usually be arranged, the full likelihood function is given by 
the product over all of the channels, 
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I(M,fl) = n%9i)> (82) 

i 

where 6 represents the complete set of all nuisance parameters. The profile likelihood ratio 
A(/i) is therefore 

x{) = iuheM . (83) 

Because the Asimov data contain no statistical fluctuations, one has jl = fi' for all chan- 
nels. Furthermore any common components of Oi are the same for all channels. Therefore 
when using the Asimov data corresponding to a strength parameter \j! one finds 

w = Kl^»' (84) 

where Aa^A 4 ) * s the profile likelihood ratio for the ith channel alone. 

Because of this, it is possible to determine the values of the profile likelihood ratio entering 
into (|84p separately for each channel, which simplifies greatly the task of estimating the 
median significance that would result from the full combination. It should be emphasized, 
however, that to find the discovery significance or exclusion limits determined from real data, 
one needs to construct the full likelihood function containing a single parameter fj,, and this 
must be used in a global fit to find the profile likelihood ratio. 

4.3 Expected statistical variation (error bands) 

By using the Asimov data set we can find the median, assuming some strength parameter jj 
of the significance for rejecting a hypothesized value fi. Even if the hypothesized value y! is 
correct, the actual data will contain statistical fluctuations and thus the observed significance 
is not in general equal to the median. 

For example, if the signal is in fact absent but the number of background events fluctuates 
upward, then the observed upper limit on the parameter \x will be weaker than the median 
assuming background only. It is useful to know by how much the significance is expected to 
vary, given the expected fluctuations in the data. As we have formulae for all of the relevant 
sampling distributions, we can also predict how the significance is expected to vary under 
assumption of a given signal strength. 

It is convenient to calculate error bands for the median significance corresponding to the 
±iW variation of jl. As jl is Gaussian distributed, these error bands on the significance are 
simply the quantiles that map onto the variation of jl of ±Na about (j! '. 

For the case of discovery, i.e., a test of \i = 0, one has from Eqs. (jTT|) and ([53]) that the 
significance Zq is 

jija ji > , 
Z ={ (85) 

/}<0. 
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Furthermore the median significance is found from Eq. (|79|) . so the significance values corre- 
sponding to // ± Na are therefore 



Z (n' + Na) = med[Z\fi'] + N , (86) 

Z (n' - Na) = max [med[Z|//] - N, 0] . (87) 

For the case of exclusion, when using both the statistic q^ as well as q^ one found the 
same expression for the upper limit at a confidence level of 1 — a, namely, Eq. (|6ip . Therefore 
the median upper limit assuming a strength parameter // is found simply by substituting 
this for /t, and the ±Na error bands are found similarly by substituting the corresponding 
values of fjf ± Na. That is, the median upper limit is 

med[/x U p|//] = fjf + a$>- 1 (l-a) , (88) 

and the ±Na error band is given by 

baad N<T = (i' + a(^~ 1 (l-a)±N). (89) 

The standard deviation a of fi can be obtained from the Asimov value of the test statistic q^ 
(or q^) using Eq. ([30]). 

5 Examples 

In this section we describe two examples, both of which are special cases of the generic analysis 
described in Section [2 Here one has a histogram n = (m, . . . , n^) for the main measurement 
where signal events could be present and one may have another histogram m = (mi, . . . , tum) 
as a control measurement, which helps constrain the nuisance parameters. In Section [5. II we 
treat the simple case where each of these two measurements consists of a single Poisson 
distributed value, i.e., the histograms each have a single bin. We refer to this as a "counting 
experiment" . In Section 15.21 we consider multiple bins for the main histogram, but without 
a control histogram; here the measured shape of the main histogram on either side of the 
signal peak is sufficient to constrain the background. We refer to this as a "shape analysis". 

5.1 Counting experiment 

Consider an experiment where one observes a number of events n, assumed to follow a Poisson 
distribution with an expectation value E[n] = fis + b. Here s represents the mean number 
of events from a signal model, which we take to be a known value; b is the expected number 
from background processes, and as usual [i is the strength parameter. 

We will treat b as a nuisance parameter whose value is constrained by a control mea- 
surement. This measurement is also a single Poisson distributed value m with mean value 
E[m] = rb. That is, rb plays the role of the function u for the single bin of the control 
histogram in Eq. ([5]). In a real analysis, the value of the scale factor r may have some uncer- 
tainty and could be itself treated as a nuisance parameter, but in this example we will take 
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its value to be known. Related aspects of this type of analysis have been discussed in the 
literature, where it is sometimes referred to as the "on-off problem" (see, e.g., [TT|[T3]). 

The data thus consist of two measured values: n and m. We have one parameter of 
interest, //, and one nuisance parameter, b. The likelihood function for \i and b is the product 
of two Poisson terms: 

n! ml 

To find the test statistics qo, q^ and q^, we require the ML estimators ft, b as well as the 
conditional ML estimator b for a specified \x. These are found to be 



(91) 
(92) 



A = 


n — m/r 
s 


b = 


m 




T 


b = 


n + m — (1 + t)[j,s 


2(1 +r) 



1/2 

(n + m — (1 + t)/j,s) 2 + 4(1 + T)m/j,s 

4(1 + r) 2 



(93) 



Given measured values n and m, the estimators from Eqs. (|91j) . (j92j) and (j93j) can be 
used in the likelihood function ([90]) to find the values of the test statistics qo, q^ and q^. By 
generating data values n and m by Monte Carlo we can compare the resulting distributions 
with the formulae from Section [3j 

The pdf f(qo\0), i.e., the distribution of qo for under the assumption of \x = 0, is shown 
in Fig. [3|a). The histograms show the result from Monte Carlo simulation based on several 
different values of the mean background b. The solid curve shows the prediction of Eq. (I49p , 
which is independent of the nuisance parameter b. The point at which one finds a significant 
departure between the histogram and the asymptotic formula occurs at increasingly large 
qo for increasing b. For b = 20 the agreement is already quite accurate past qo = 25, 
corresponding to a significance of Z = y/% = 5. Even for b = 2 there is good agreement out 
to qo sa 10. 

Figure E^b) shows distributions of qo assuming a strength parameter // equal to and 
1. The histograms show the Monte Carlo simulation of the corresponding distributions using 
the parameters s = 10, b = 10, r = 1. For the distribution /(go|l) from Eq. ([4*8]) . one requires 
the value of a, the standard deviation of fi assuming a strength parameter /x' = 1. Here this 
was determined from Eq. (|32p using the Asimov value go, A; i-e., the value obtained from the 
Asimov data set with n —> fjfs + b and m —> rb. 

We can investigate the accuracy of the approximations used by comparing the discovery 
significance for a given observed value of qo from the approximate formula with the exact 
significance determined using a Monte Carlo calculation. Figure H|a) shows the discovery 
significance that one finds from qo = 16. According to Eq. (j53() . this should give a nominal 
significance of Z = ^/qo = 4, indicated in the figure by the horizontal line. The points 
show the exact significance for different values of the expected number of background events 
b in the counting analysis with a scale factor r = 1. As can be seen, the approximation 
underestimates the significance for very low b, but achieves an accuracy of better than 10% 
for b greater than around 4. It slightly overestimates for b greater than around 5. This 
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Figure 3: (a) The pdf f(qo\0) for the counting experiment. The solid curve shows /(<?o|0) from 
Eq. (|49p and the histograms are from Monte Carlo using different values of b (see text), (b) The 
distributions f(qo\0) and /((?o|l) from both the asymptotic formulae and Monte Carlo simulation 
based on s = 10, b = 10, r = 1. 

phenomenon can be seen in the tail of f(qo\0) in Fig. [3^b), which uses b = 10. The accuracy 
then rapidly improves for increasing b. 
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Figure 4: (a) The discovery significance Zq obtained from Monte Carlo (points) corresponding to a 
nominal value Zq = y/qo = 4 (dashed line) as a function of the expected number of background events 
b, in the counting analysis with a scale factor r = 1. (b) The median of qo assuming data distributed 
according to the nominal signal hypothesis from Monte Carlo for different values of s and b (points) 
and the corresponding Asimov values (curves) . 

Figure[U(b) shows the median value of the statistic qo assuming data distributed according 
to the nominal signal hypothesis from Monte Carlo (points) and the value based on the Asimov 
data set as a function of b for different values of s, using a scale factor r = 1. One can see 
that the Asimov data set leads to an excellent approximation to the median, except at very 
low s and b. 

Figure [5|a) shows the distribution of the test statistic q\ for s = 6, b = 9, r = 1 for data 
corresponding to a strength parameter jj! = 1 and also // = 0. The vertical lines indicate the 
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Asimov values of q\ and q\ assuming a strength parameter // = 0. These lines correspond to 
estimates of the median values of the test statistics assuming // = 0. The areas under the 
curves f(q\\l) and /(<7i|l) to the right of this line give the median p-values. 



(a) 





(b) 



Figure 5: (a) The pdfs f(q\\l) and f(qi\0) for the counting experiment. The solid curves show the 
formulae from the text, and the histograms are from Monte Carlo using s = 6, b = 9, r = 1. (b) 
The same set of histograms with the alternative statistic q\ . The oscillatory structure evident in the 
histograms is a consequence of the discreteness of the data. The vertical line indicates the Asimov 
value of the test statistic corresponding to // = 0. 

For the example described above we can also find the distribution of the statistic q = 
—2ln(L s+ f,/Lb) as defined in Sec. 13.81 Figure [6] shows the distributions of q for the hypothesis 
of /j, = (background only) and /x = 1 (signal plus background) for the model described above 
using b = 20, s = 10 and r = 1. The histograms are from Monte Carlo, and the solid curves 
are the predictions of the asymptotic formulae given in Sec. 13.81 Also shown are the p- values 
for the background-only and signal-plus-background hypotheses corresponding to a possible 
observed value of the statistic q b s . 



0.15 - 




0.05 - 



Figure 6: The distribution of the statistic 
q = — 2 ln(L s+ i,/Z/f,) under the hypotheses 
of n = and /i = 1 (see text). 



5.1.1 Counting experiment with known b 

An important special case of the counting experiment above is where the mean background b 
is known with negligible uncertainty and can be treated as a constant. This would correspond 
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to having a very large value for the scale factor r. 

If we regard b as known, the data consist only of n and thus the likelihood function is 

m = Of£+&r e -G-*) t (94 ) 

n! 
The test statistic for discovery qo can be written 

-21b #& /}>0, 




(95) 

A<o, 

where p, = n — b. For sufficiently large 6 we can use the asymptotic formula (|53p for the 
significance, 



7 _ , y2(nlnf + 6-n) A > 0, 

^o = \/<7o = "( (96) 

/}<0. 

To approximate the median significance assuming the nominal signal hypothesis (// = 1) 
we replace n by the Asimov value s + 6 to obtain 



med[Z | 1] = V*7 = V 2 (( s + 6 ) ln ( 1 + s / 6 ) " s ) • ( 97 ) 

Expanding the logarithm in s/b one finds 

med[Z |l] = 4= (1 + 0(s/b)) . (98) 

Vb 

Although Zq ~ s/yb has been widely used for cases where s + b is large, one sees here that 
this final approximation is strictly valid only for s <C b. 

Median values, assuming /i = 1, of Zq for different values of s and b are shown in Fig. [71 
The solid curve shows Eq. (J9"7j) . the dashed curve gives the approximation s/yb, and the 
points are the exact median values from Monte Carlo. The structure seen in the points 
is due to the discrete nature of the data. One sees that Eq. (|97p provides a much better 
approximation to the true median than does s/yb in regions where s/b cannot be regarded 
as small. 

5.2 Shape Analysis 

As a second example we consider the case where one is searching for a peak in an invari- 
ant mass distribution. The main histogram n = (ni,...,njv) for background is shown in 
Fig. El which is here taken to be a Rayleigh distribution. The signal is modeled as a Gaus- 
sian of known width and mass (position) . In this example there is no subsidiary histogram 
(mi,...,m M )- 

If, as is often the case, the position of the peak is not known a priori, then one will test all 
masses in a given range, and appearance of a signal-like peak anywhere could lead to rejection 
of the background-only hypothesis. In such an analysis, however, the discovery significance 
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Figure 7: The median, assuming (J, — 1, 
of the discovery significance Zq for different 
values of s and b (see text). 



10 20 30 40 50 60 70 80 90 100 



Figure 8: The background mass 
distribution for the shape analysis 
(see text). 



must take into account the fact that a fluctuation could occur at any mass within the range. 
This is often referred to as the "look-elsewhere effect" , and is discussed further in Ref . |14j . 

In the example presented here, however, we will test all values of the mass and \jl using the 
statistic q^ for purposes of setting an upper limit on the signal strength. Here, each hypothesis 
of mass and signal strength is in effect tested individually, and thus the look-elsewhere effect 
does not come into play. 

We assume that the signal and background distributions are known up to a scale factor. 
For the signal, this factor corresponds to the usual strength parameter //; for the background, 
we introduce an analogous factor 9. That is, the mean value of the number of events in the 
ith bin is E[rii] = /u,Si + bi, where /x is the signal strength parameter and the Sj are taken as 
known. We assume that the background terms bi can be expressed as b% = ^/b,«, where the 
probability to find a background event in bin i, /^ j, is known, and 9 is a nuisance parameter 
that gives the total expected number of background events. Therefore the likelihood function 
can be written 



L(H, 



-pr (vsi + 



lb A 



-(jj.si+ef bti ) 



ri: 



(99) 



For a given data set n = (m, . . . , njv) one can evaluate the likelihood (|99p and from this 
determine any of the test statistics discussed previously. Here we concentrate on the statistic 
q^ used to set an upper limit on //, and compare the distribution /(^|/i') from Eq.(ll8~|) with 
histograms generated by Monte Carlo. Figure [9] shows f(q^\0) (red) and f{q^\^) (blue). 
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Figure 9: The distributions 
/(g„|0) (red) and /(g M ]/x) (blue) 
from both the asymptotic formulae 
and Monte Carlo histograms (see 
text). 



The vertical line in Fig. [9] gives the median value of q^ assuming a strength parameter 
// = 0. The area to the right of this line under the curve of f{q ll \^) gives the p- value of 
the hypothesized //, as shown shaded in green. The upper limit on /i at a confidence level 
CL = 1 — a is the value of /i for which the p- value is p^ = a. Figure [9] shows the distributions 
for the value of \x that gave p^ = 0.05, corresponding to the 95% CL upper limit. 

In addition to reporting the median limit, one would like to know how much it would vary 
for given statistical fluctuations in the data. This is illustrated in Fig. [TO] which shows the 
same distributions as in Figure[9l but here the vertical line indicates the 15.87% quantile of the 
distribution f(q^\0), corresponding to having /} fluctuate downward one standard deviation 
below its median. 




Figure 10: The distributions 
/(g M |0) (red) and f(q^\n) (blue) as 
in Fig.[9]and the 15.87% quantile of 
/(g M |0) (see text). 



By simulating the experiment many times with Monte Carlo, we can obtain a histogram 
of the upper limits on /j, at 95% CL, as shown in Fig. [Til The drier (green) and ±2<r (yellow) 
error bands are obtained from the MC experiments. The vertical lines indicate the error 
bands as estimated directly (without Monte Carlo) using Eqs. ([88]) and (f89|) . As can be seen 
from the plot, the agreement between the formulae and MC predictions is excellent. 

Figures [9] through [11] correspond to finding upper limit on \x for a specific value of the peak 
position (mass). In a search for a signal of unknown mass, the procedure would be repeated 
for all masses (in practice in small steps). Figure [12] shows the median upper limit at 95% CL 
as a function of mass. The median (central blue line) and error bands (zblcr in green, ±2<r in 
yellow) are obtained using Eqs. (I88p and (I89p . The points and connecting curve correspond 
to the upper limit from a single arbitrary Monte Carlo data set, generated according to the 
background-only hypothesis. As can be seen, most of the plots lie as expected within the 
±lcr error band. 
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Figure 11: Distribution of the 
upper limit on /x at 95% CL, as- 
suming data corresponding to the 
background-only hypothesis (see 
text). 



Figure 12: The median (central 
blue line) and error bands (±1<j in 
green, ±2<r in yellow) for the 95% 
CL upper limit on the strength pa- 
rameter fi (see text). 



6 Implementation in RooStats 



Many of the results presented above are implemented or are being implemented in the 
RooStats framework [15] . which is a C++ class library based on the ROOT [16] and RooFit [17] 
packages. The tools in RooStats can be used to represent arbitrary probability density func- 
tions that inherit from RooAbsPdf , the abstract interfaces for probability density functions 
provided by RooFit. 

The framework provides an interface with minimization packages such as Minuit |18| . 
This allows one to obtain the estimators required in the the profile likelihood ratio: ft, 

9, and 6. The Asimov dataset defined in Eq. (|24|) can be determined for a probability 
density function by specifying the ExpectedDataO command argument in a call to the 
generateBinned method. The Asimov data together with the standard HESSE covariance 
matrix provided by Minuit makes it is possible to determine the Fisher information matrix 
shown in Eq. (j28[) . and thus obtain the related quantities such as the variance of ft and the 
noncentrality parameter A, which enter into the formulae for a number of the distributions 
of the test statistics presented above. 

The distributions of the various test statistics and the related formulae for p- values, sensi- 
tivities and confidence intervals as given in Sections [21 [3] and |4] are being incorporated as well. 
RooStats currently includes the test statistics t^, t^, qo, and q,q^, and q^ as concrete imple- 
mentations of the TestStatistic interface. Together with the Asimov data, this provides 
the ability to calculate the alternative estimate, a a, for the variance of /t shown in Eq. (]30p . 
The noncentral chi-square distribution is being incorporated into both RooStats and ROOT's 
mathematics libraries for more general use. The various transformations of the noncentral 
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chi-square used to obtain Eqs. (f33j) . ((4"1~|) . ((151) . ((551) . and ((63|) are also in development in the 
form of concrete implementations of the SamplingDistribution interface. Together, these 
new classes will allow one to reproduce the examples shown in Section [5] and to extend them 
to an arbitrary model within the RooStats framework. 



7 Conclusions 

Statistical tests are described for use in planning and carrying out a search for new phe- 
nomena. The formalism allows for the treatment of systematic uncertainties through use of 
the profile likelihood ratio. Here a systematic uncertainty is included to the extent that the 
model includes a sufficient number of nuisance parameters so that for at least some point in 
its parameter space it can be regarded as true. 

Approximate formulae are given for the distributions of test statistics used to characterize 
the level of agreement between the data and the hypothesis being tested, as well as the related 
expressions for p-values and significances. The statistics are based on the profile likelihood 
ratio and can be used for a two-sided test of a strength parameter \x (t^), a one-sided test for 
discovery (go), and a one-sided test for finding an upper limit (q^ and q^). The statistic t^ 
can be used to obtain a "unified" confidence interval, in the sense that it is one- or two-sided 
depending on the data outcome. 

Formulae are also given that allow one to characterize the sensitivity of a planned exper- 
iment through the median significance of a given hypothesis under assumption of a different 
one, e.g., median significance with which one would reject the background-only hypothesis 
under assumption of a certain signal model. These exploit the use of an artificial data set, 
the "Asimov" data set, defined so as to make estimators for all parameters equal to their true 
values. Methods for finding the expected statistical variation in the sensitivity (error bands) 
are also given. 

These tools free one from the need to carry out lengthy Monte Carlo calculations, which 
in the case of a discovery at 5<r significance could require simulation of around 10 8 measure- 
ments. They are are particularly useful in cases where one needs to estimate experimental 
sensitivities for many points in a multidimensional parameter space (e.g., for models such as 
supersymmetry), which would require generating a large MC sample for each point. 

The approximations used are valid in the limit of a large data sample. Tests with Monte 
Carlo indicate, however, that the formulae are in fact reasonably accurate even for fairly small 
samples, and thus can have a wide range of practical applicability. For very small samples 
and in cases where high accuracy is crucial, one is always free to validate the approximations 
with Monte Carlo. 
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